Behavioral Analysis of Benzoquinone Defensive Gland Usage by Bug

(c) 2020 Julian Wagner. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license. Code for non-parametric bootstrapping of linear regression slopes from work by Justin Bois.

In [2]:
import numpy as np
import pandas as pd
import bokeh
import bokeh.io
import bokeh.plotting
import bebi103
import bokeh_catplot
import re
bokeh.io.output_notebook()
Loading BokehJS ...

Study setup

Liometopum occidentale is an ecologically dominant ant of oak forests in the Southwest United States and Mexico. It has numerous symbionts, including several species of beetle, a cricket, as well as hemipterans. Pamillia behrensii is one associate of the L. occidentale. P. behrensii has a metathoracic gland with classic defensive compounds including benzoquinones, a potent irritant. To test whether the bug uses it's gland secretion for defense, we performed behavioral experiments coupled with solid phase micro-extraction and mass spectrometry to assess whether aggressive interactions with ants elicit gland usage. Here, we measured whether the amount of bug movement over the course of a behavior trial (a proxy for attempts to escape/avoid a threatening stimulus) correlates with amount of gland secretion deployed.

Pamillia Pamillia

Pamillia behrensii. The metathoracic gland can be seen on the underside of the bug, between second and third set of legs; it is the white curved structure.

Behavioral analysis

To assess gland deployment, we employed an infrared illuminated arena, and placed a bug with either 5 ants, 5 ants with glued mandibles (to reduce their aggression/potency against the bug), or alone. We chilled the animals on ice and loaded them into the arena. The trials were run for 30 minutes with SPME fiber deployed, before the SPME fiber was injected into a GCMS. To start the analysis, we begin by loading up metadata on the behavioral videos into a dataframe.

In [3]:
metadata = pd.read_csv("./pamillia_dlc_analysis/pamillia_spme_metadata.csv", index_col=0)
metadata.head()
Out[3]:
species ID date SPME_start_time SPME_stop_time SPME_total_time interactor order heptan-2-one heptyl acetate 2-ethyl-1-4-benzoquinone 2-ethyl-1-4-hydroquinone DLC_file x1_crop x2_crop pixPerMm timestamp_file
0 pamillia 1 20191107 59 620 561 5_locc 1 12138 484 10548 5249 ./pamillia_dlc_analysis/pamillia_1_cam_1_date_... 160 1230 26.750 ./pamillia_dlc_analysis/pamillia_1_cam_1_date_...
1 pamillia 1 20191107 56 1813 1757 5_locc 2 714989962 2887643 115947543 171544 ./pamillia_dlc_analysis/pamillia_1_round2_cam_... 150 1230 27.000 ./pamillia_dlc_analysis/pamillia_1_round2_cam_...
2 pamillia 2 20191107 69 1914 1845 5_locc 1 46007517 485025 1917440 628632 ./pamillia_dlc_analysis/pamillia_2_cam_1_date_... 150 1225 26.875 ./pamillia_dlc_analysis/pamillia_2_cam_1_date_...
3 pamillia 2 20191107 37 1941 1904 5_locc 2 1358301 197711 216459 83717 ./pamillia_dlc_analysis/pamillia_2_round2_cam_... 150 1225 26.875 ./pamillia_dlc_analysis/pamillia_2_round2_cam_...
4 pamillia 3 20191112 46 1817 1771 5_locc 1 48541 79062 85775 87096 ./pamillia_dlc_analysis/pamillia_3_round1_cam_... 150 1240 27.250 ./pamillia_dlc_analysis/pamillia_3_round1_cam_...

Peaks from the GCMS run were manually integrated for the four identified gland compounds to get total counts, a measure for total amount of each compound from the gland secretion. In order to look at the behavior of the bug, namely its movement over time, we used DeepLabCut (DLC). Markers for the head, two places on the thorax and two on the abdomen were manually annotated on 1033 frames from a variety of of videos. Some images were annotated as refinement of the trained network after identification of poorly labeled frames. The behavioral videos and the results of the annotation with DLC look as follows:

Behavioral data frame DLC annotated data frame

We can take a look at the chemical data to see amounts of gland compounds deployed in our different treatment conditions (5 ants [5_locc], 5 glued-mandible ants [5_locc_glued], or bug alone [alone]). We plot both raw $\mathrm{count}$ data as well as $log(\mathrm{count})$ data due to the large scale of the axis.

In [4]:
metadata['log heptan-2-one'] = np.log(metadata['heptan-2-one'])
metadata['log heptyl acetate'] = np.log(metadata['heptyl acetate'])
metadata['log 2-ethyl-1-4-benzoquinone'] = np.log(metadata['2-ethyl-1-4-benzoquinone'])
metadata['log 2-ethyl-1-4-hydroquinone'] = np.log(metadata['2-ethyl-1-4-hydroquinone'])

plots = []
for chem in ['heptan-2-one', 'heptyl acetate', '2-ethyl-1-4-benzoquinone', '2-ethyl-1-4-hydroquinone',
             'log heptan-2-one', 'log heptyl acetate', 'log 2-ethyl-1-4-benzoquinone', 'log 2-ethyl-1-4-hydroquinone',]:
    p = bokeh_catplot.strip(
        data=metadata.loc[metadata['SPME_total_time']>1600],
        cats='interactor',
        val=chem,
        jitter=True,
        horizontal=True,
        marker_kwargs=dict(alpha=0.5, size=8),plot_height=150,palette=bokeh.palettes.viridis(3), plot_width=500,title='Counts from integrated GCMS peak',
    )
    plots.append(p)

bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))

We can see quite a bit of variability between trials, but a generally very high quantity of gland compound deployed when the bug interacted with ants, whereas a small/GCMS baseline amount present when the bug was alone in the arena. This indicates the bug actively uses the gland, rather than the compounds continuously diffusing from the gland. The stark difference in the distribution of amount of chemical used when the big faces an enemy vs is alone is more clearly seen with an empirical cumulative distribution of the data.

In [5]:
plots = []
for chem in ['heptan-2-one', 'heptyl acetate', '2-ethyl-1-4-benzoquinone', '2-ethyl-1-4-hydroquinone']:
    p = bokeh_catplot.ecdf(
        data=metadata.loc[metadata['SPME_total_time']>1600],
        cats='interactor',
        val=chem,
        marker_kwargs=dict(alpha=1), style='staircase', plot_width=500,palette=bokeh.palettes.viridis(3),title='Counts from integrated GCMS peak'
    )
    p.legend.location = 'bottom_right'
    plots.append(p)

bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))

The distributions look pretty similar for the ant and glued ant conditions, and compound amount near to zero for the big alone. We can now look at whether the level of movement of the bug, a proxy for how much it engages in an escape response or how agitated it is, correlates with how much chemical it secreted from its gland during the trial. We define some helper functions to read in the DLC data and filter it.

In [6]:
def read_files(DLC_file, time_file, network, start, stop, SPME_time_buffer=20):
    df_pos = pd.read_hdf(DLC_file)
    df_pos_time = pd.read_csv(time_file)
    df_pos[(network, 'all', 't')] = df_pos_time['time']
    
    df_pos_med = df_pos.rolling(5).median()
    df_pos_part_med_x = np.median(df_pos_med.dropna()[[(network, 'bh', 'x'), (network, 'bt1', 'x'), (network, 'bt2', 'x'), (network, 'ba1', 'x'), (network, 'ba2', 'x')]], axis=1)
    df_pos_part_med_y = np.median(df_pos_med.dropna()[[(network, 'bh', 'y'), (network, 'bt1', 'y'), (network, 'bt2', 'y'), (network, 'ba1', 'y'), (network, 'ba2', 'y')]], axis=1)
    
    df_tidy = pd.DataFrame({'t':df_pos_med.dropna()[(network, 'all', 't')], 'x':df_pos_part_med_x, 'y':df_pos_part_med_y})
    if stop == 'full':
        df_tidy = df_tidy.loc[df_tidy['t']>start]
    else:
        df_tidy = df_tidy.loc[(df_tidy['t']>start) & (df_tidy['t']<stop)]
    return(df_tidy)

def load_DLC_medians(metadata, ind, SPME_time_buffer=20):
    
    DLC_file = metadata.iloc[ind]['DLC_file']
    time_file = metadata.iloc[ind]['timestamp_file']
    
    if ';' in metadata.iloc[ind]['DLC_file']:
        file1, file2 = re.match('([^;]+); *([^;]+)', DLC_file).groups()
        file1_time, file2_time = re.match('([^;]+); *([^;]+)', time_file).groups()
        network = re.search('(DLC.*).h5', file1)[1]
        start, stop = (metadata.iloc[ind]['SPME_start_time'], metadata.iloc[ind]['SPME_stop_time'])
        df1 = read_files(file1, file1_time, network, start, 'full')
        df2 = read_files(file2, file2_time, network, 0, stop)
        df2['t'] = df2['t'] + df1['t'].max()
        df_tidy = pd.concat([df1, df2])
    else:    
        network = re.search('(DLC.*).h5', DLC_file)[1]
        start, stop = (metadata.iloc[ind]['SPME_start_time'], metadata.iloc[ind]['SPME_stop_time'])
        df_tidy = read_files(DLC_file, time_file, network, start, stop)
    
    return(df_tidy)

With these functions ready, we can iterate through the the data and read in the position information of the bug in the arena over time. Note that there were some preliminary experiments that were shorter than 30 mins. With SPME, if the trial is not run for a comparable length of time, the chemical amounts cannot be directly compared. Hence, these short trials were excluded from the following analysis.

In [7]:
cols = {'5_locc':bokeh.palettes.viridis(3)[0], '5_locc_glued':bokeh.palettes.viridis(3)[1], 'alone':bokeh.palettes.viridis(3)[2]}
df_t_list, colors, interactors, IDs, BQs, pixPerMms, trialOrder = ([], [], [], [], [], [], [])
for i in range(len(metadata)):
    if metadata.iloc[i]['SPME_total_time'] < 1500:
        continue
    df_t_list.append(load_DLC_medians(metadata, i))
    colors.append(cols[metadata.iloc[i]['interactor']])
    interactors.append(metadata.iloc[i]['interactor'])
    IDs.append(metadata.iloc[i]['ID'])
    BQs.append(metadata.iloc[i]['2-ethyl-1-4-benzoquinone'])
    pixPerMms.append(metadata.iloc[i]['pixPerMm'])
    trialOrder.append(metadata.iloc[i]['order'])

We can now check on what the positional data looks like. For this, we will plot the x-position of a bug over time. Each line is a different experiment, with the bug ID listed on the left. The y-axis represents the position in the arena over time, so movement up or down in that axis represents movement across the arena. The positional values are shifted and scaled to show all the trials.

In [8]:
p = bokeh.plotting.figure(plot_width=1500, x_axis_label='time (min)')
p.yaxis.visible=False
for i, df_t in enumerate(df_t_list):
    p.line((df_t['t']-df_t['t'].min())/60, 1.5*IDs[i]+trialOrder[i]/4+df_t['x']/6000, color=colors[i], legend_label=interactors[i], line_width=3)
    if trialOrder[i] == 1 or IDs[i]==1:
        mytext = bokeh.models.Label(x=-0.8, y=1.5*IDs[i]+trialOrder[i]/4, text=str(IDs[i]), text_align='center', text_font_size='18pt')
        p.add_layout(mytext)
        
mytext = bokeh.models.Label(x=-0.8, y=1.5*IDs[i]+trialOrder[i]/4, text='ID#', text_align='center', text_font_size='12pt')
p.add_layout(mytext)
p.legend.location='top_right'
bokeh.io.show(p)

We can see that there is quite a bit of variability in how much the bug moves in the various experiments. The more the line squiggle in the line, the more the animal is moving. Note that in some trials the movement trace rapidly oscillates up and down, which indicates that the bug is running around the edge of the circular arena (which is classically considered a stress response). For the next step, we will collapse all this movement data into a single number per trial, the distanced traversed by the animal over the 30 minute experiment. Though it loses a lot of the information, it gives one readout for how agitated the bug was during the trial. In order to get a statistical assessment of whether there is a correlation of gland chemical deployment with bug distance traveled, we first define a function to do a non-parametric bootstrap of the linear regression slope/intercept for the data. Due to the large magnitude/range of values for chemical amount and bug movement, we perform the regression on log transformed data.

In [9]:
def draw_bs_pairs_linreg(x, y, size=1):
    """Perform pairs bootstrap for linear regression."""
    # Set up array of indices to sample from
    inds = np.arange(len(x))

    # Initialize samples
    bs_slope_reps = np.empty(size)
    bs_intercept_reps = np.empty(size)

    # Take samples
    for i in range(size):
        bs_inds = np.random.choice(inds, len(inds))
        bs_x, bs_y = x[bs_inds], y[bs_inds]
        bs_slope_reps[i], bs_intercept_reps[i] = np.polyfit(bs_x, bs_y, deg=1)

    return bs_slope_reps, bs_intercept_reps

With this function ready, we can make a plot with a confidence interval for the slope of amount of chemical deployed vs distance of bug movement.

In [10]:
x_move = []
y_bq = []
for i, df_t, color, interactor, ID, BQ, pixPerMm in zip(range(len(df_t_list)), df_t_list, colors, interactors, IDs, BQs, pixPerMms):
    move = np.sqrt(np.square(df_t.shift()['x'] - df_t['x']) + np.square(df_t.shift()['y'] - df_t['y'])).sum()
    move = move/(pixPerMm*10)
    x_move.append(np.log(move))
    y_bq.append(np.log(BQ))

x_move = np.array(x_move)
y_bq = np.array(y_bq)
bs_slope_reps, bs_intercept_reps = draw_bs_pairs_linreg(x_move, y_bq, size=10000)

p = bokeh.plotting.figure(plot_height=400, plot_width=500, x_axis_label='log(cm traveled by bug during trial)', y_axis_label='log(BQ counts from GCMS of SPME)')
p.title.text = 'Slope 95% CI: ' + str([round(j, 3) for j in np.percentile(bs_slope_reps, [2.5, 97.5])])

# x-values
x = np.linspace(x_move.min(), x_move.max(), 200)
# y-values of each point
y = np.outer(bs_slope_reps, x) + np.stack([bs_intercept_reps]*200, axis=1)

# Compute percentiles for [2.5th and 97.5th], [12.5th and 87.5th], and [45.0th and 55.0th]
for percents in [[2.5, 97.5], [12.5, 87.5], [45.0, 55.0]]:
    low, high = np.percentile(y, percents, axis=0)
    p1 = np.append(x, x[::-1])
    p2 = np.append(low, high[::-1])
    p.patch(p1, p2, alpha=0.5, color='grey', line_color='grey')

p.circle(x_move, y_bq, color=colors, size=20)

p.output_backend = 'svg'
bokeh.io.show(p)

Note that the gray intervals show the 10th, 75th, and 95th percentiles for a regression of data points generated via a non-parametric bootstrap. There is a lot of variability between trials in both the amount of chemical deployed and the movement of the insect! Even so, there is a significant positive association between amount of bug movement (a proxy for agitation/attempts to escape an enemy) and amount of gland chemical deployed. This gives a very strong indication that gland compound secretion is actively controlled by the bug and used in defense to help the animal escape a would-be predator or aggressor. One large confound for the movement measurement are cases where the ant is able to successfully grab/hold the bug. In such cases, the bug will not be able to run (reducing the movement) while it may still deploy the gland. Additionally, the movement measure does not directly capture the level of aggressiveness of the ants, but instead is a measure of the attempted escape/agitation of the bug. Correlation of some other behavioral metrics, like how much time spent close to ants or how much the ants moved during th experiment, with amount of chemical deployed would provide additional insight into the bug's pattern of usage of it's defensive gland.